Identifying Literary Texts with Bigrams

نویسندگان

  • Andreas van Cranenburgh
  • Corina Koolen
چکیده

We study perceptions of literariness in a set of contemporary Dutch novels. Experiments with machine learning models show that it is possible to automatically distinguish novels that are seen as highly literary from those that are seen as less literary, using surprisingly simple textual features. The most discriminating features of our classification model indicate that genre might be a confounding factor, but a regression model shows that we can also explain variation between highly literary novels from less literary ones within genre.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Identifying Speakers and Listeners of Quoted Speech in Literary Works

We present the first study that evaluates both speaker and listener identification for direct speech in literary texts. Our approach consists of two steps: identification of speakers and listeners near the quotes, and dialogue chain segmentation. Evaluation results show that this approach outperforms a rule-based approach that is stateof-the-art on a corpus of literary texts.

متن کامل

Does the Phonology of L1 Show Up in L2 Texts?

The relative frequencies of character bigrams appear to contain much information for predicting the first language (L1) of the writer of a text in another language (L2). Tsur and Rappoport (2007) interpret this fact as evidence that word choice is dictated by the phonology of L1. In order to test their hypothesis, we design an algorithm to identify the most discriminative words and the correspo...

متن کامل

Multifractal analysis of sentence lengths in English literary texts

This paper presents analysis of 30 literary texts written in English by different authors. For each text, there were created time series representing length of sentences in words and analyzed its fractal properties using two methods of multifractal analysis: MFDFA and WTMM. Both methods showed that there are texts which can be considered multifractal in this representation but a majority of tex...

متن کامل

Annotating Characters in Literary Corpora: A Scheme, the CHARLES Tool, and an Annotated Novel

Characters form the focus of various studies of literary works, including social network analysis, archetype induction, and plot comparison. The recent rise in the computational modelling of literary works has produced a proportional rise in the demand for character-annotated literary corpora. However, automatically identifying characters is an open problem and there is low availability of lite...

متن کامل

The Effect of Authentic and Simplified Literary Texts on the Reading Comprehension of Iranian Advanced EFL Learners

The present quasi-experimental study mainly investigates the role of literature as input for reading comprehension in Iranian EFL classrooms. To be more exact, it investigates the effects of authentic and simplified literary texts on the reading comprehension of Iranian advanced EFL learners. The participants were 35 male and female Iranian EFL learners who were at advanced level, studying in a...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015